(54) Method and system for buffering recognized words during speech recognition



(57) A method and system for editing words that
have been misrecognized. The system allows a speaker
to specify a number of alternative words to be displayed
in a correction window by resizing the correction window.
The system also displays the words in the correction
window in alphabetical order. A preferred system
eliminates the possibility, when a misrecognized word is
respoken, that the respoken utterance will be again
recognized as the same misrecognized word. The system,
when operating with a word processor, allows the
speaker to specify the amount of speech that is buffered
before transferring to the word processor.
Description

TECHNICAL FIELD 

The present invention relates to computer speech
recognition, and more particularly, to the editing of dic- 
tation produced by a speech recognition system. 

BACKGROUND OF THE INVENTION 

A computer speech dictation system that would
allow a speaker to efficiently dictate and would allow the
dictation to be automatically recognized has been a
long-sought goal of developers of computer speech
systems. The benefits that would result from such a
computer speech recognition (CSR) system are
substantial. For example, rather than typing a document
into a computer system, a person could simply speak
the words of the document, and the CSR system would
recognize the words and store the letters of each word
as if the words had been typed. Since people generally
can speak faster than they can type, efficiency would be
improved. Also, people would no longer need to learn
how to type. Computers could also be used in many
applications where their use is currently impracticable
because a person's hands are occupied with tasks
other than typing.

Typical CSR systems have a recognition component
and a dictation editing component. The recognition
component controls the receiving of the series of
utterances from a speaker, recognizing each utterance, and
sending a recognized word for each utterance to the
dictation editing component. The dictation editing
component displays the recognized words and allows a user to
correct words that were misrecognized. For example,
the dictation editing component would allow a user to
replace a word that was misrecognized by either
speaking the word again or typing the correct word.

The recognition component typically contains a
model of an utterance for each word in its vocabulary.
When the recognition component receives a spoken
utterance, the recognition component compares that
spoken utterance to the modeled utterance of each
word in its vocabulary in an attempt to find the modeled
utterance that most closely matches the spoken
utterance. Typical recognition components calculate a
probability that each modeled utterance matches the spoken
utterance. Such recognition components send to the
dictation editing component a list of the words with the
highest probabilities of matching the spoken utterance,
referred to as the recognized word list.

The dictation editing component generally selects
the word from the recognized word list with the highest
probability as the recognized word corresponding to the
spoken utterance. The dictation editing component then
displays that word. If, however, the displayed word is a
misrecognition of the spoken utterance, then the
dictation editing component allows the speaker to correct the
misrecognized word. When the speaker indicates to
correct the misrecognized word, the dictation editing
component displays a correction window that contains
the words in the recognized word list. In the event that
one of the words in the list is the correct word, the
speaker can just click on that word to effect the
correction. If, however, the correct word is not in the list, the
speaker would either speak or type the correct word.

Some CSR systems serve as a dictation facility for 
word processors. Such a CSR system controls the 
receiving and recognizing of a spoken utterance and 
then sends each character corresponding to the recog- 
nized word to the word processor. Such configurations 
have a disadvantage in that when a speaker attempts to 
correct a word that was previously spoken, the word 
processor does not have access to the recognized word 
list and thus cannot display those words to facilitate cor- 
rection. 

SUMMARY OF THE INVENTION 

The present invention provides a new and improved 
computer speech recognition (CSR) system with a rec- 
ognition component and a dictation editing component. 
The dictation editing component allows for rapid correc- 
tion of misrecognized words. The dictation editing com- 
ponent allows a speaker to select the number of 
alternative words to be displayed in a correction window 
by resizing the correction window. The dictation editing 
component displays the words in the correction window 
in alphabetical order to facilitate locating the correct 
word. In another aspect of the present invention, the 
CSR system eliminates the possibility, when a misrec- 
ognized word or phrase is respoken, that the respoken 
utterance will be again recognized as the same misrec- 
ognized word or phrase based on analysis of both the 
previously spoken utterance and the newly spoken 
utterance. The dictation editing component also allows 
a speaker to specify the amount of speech that is buff- 
ered in a dictation editing component before transferring 
the recognized words to a word processor. The dictation 
editing component also uses a word correction meta- 
phor or a phrase correction metaphor which changes 
editing actions which are normally character-based to 
be either word-based or phrase-based. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1A illustrates a sample resizable correction 
window. 

Figure 1B illustrates the sample correction window
after resizing.

Figure 2A illustrates an adjustable dictation window. 

Figure 2B illustrates the use of a correction window 
to correct text in the dictation window. 

Figures 3A-B illustrate the word/phrase correction 
metaphor for the dictation editing component. 

Figures 4A-C are block diagrams of a computer
system of a preferred embodiment.

Figure 5A is a flow diagram of a dictation editing 
component with a resizable correction window. 

Figure 5B is a flow diagram of a window procedure 
for the resizable correction window. 

Figure 6 is a flow diagram of a dictation editing 
component with an adjustable dictation window. 

Figure 7 is a flow diagram of a window procedure 
for a word processor or dictation editing component that 
implements the word correction metaphor. 

Figure 8 is a flow diagram of a CSR system that 
eliminates misrecognized words from further recogni- 
tion. 

Figure 9 is a flow diagram of automatic recognition 
training. 

DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides for a dictation edit- 
ing component that allows the editing of dictation pro- 
duced by a computer speech recognition (CSR) system. 
In an exemplary embodiment, the dictation editing com- 
ponent allows a speaker to select the number of alterna- 
tive words to be displayed in a correction window by 
resizing the correction window. The dictation editing 
component also displays the words in the correction 
window in alphabetical order. A preferred dictation edit-
ing component also eliminates the possibility, when a
misrecognized word is respoken, that the respoken
utterance will be again recognized as the same misrec-
ognized word. The dictation editing component, when
providing recognized words to an application program, 
such as a word processor, preferably allows the speaker 
to specify the amount of speech that is buffered by the 
dictation editing component before transferring recog- 
nized words to the application program. In the following, 
the various aspects of the present invention are 
described when used in conjunction with a discrete 
CSR system (i.e., the speaker pauses between each
word). These aspects, however, can also be used in 
conjunction with a continuous CSR system. For exam- 
ple, the correction window can be resized to indicate the 
number of alternative phrases to be displayed. Also, 
when a speaker selects a phrase to be replaced, the 
user interface system can ensure that the same phrase 
is not recognized again. 

Figure 1A illustrates a sample resizable correction
window. The dictation editing component window 101
contains the recognized words 102 and the correction
window 103. In this example, the speaker spoke the
words "I will make the cake." The recognition component
misrecognized the word "make" as the word "fake."
The speaker then indicated that the word "fake" should
be corrected. Before displaying the correction window,
the dictation editing component determines the current
size of the resizable correction window and calculates
the number of words that could be displayed in that
correction window. The dictation editing component then
selects that number of words from the recognized word
list with the highest probabilities (i.e., alternative words)
and displays those words in the correction window. If the
speaker wishes to see more words from the list, the
speaker simply resizes the correction window using
standard window resizing techniques (e.g., pointing to a
border of the window with a mouse pointer and dragging
the mouse). When the correction window is resized, the
dictation editing component again determines the number
of words that can be displayed in the correction window
and displays that number of words in the correction
window. The next time that the speaker indicates to correct
a word, the dictation editing component displays the
correction window with a number of words that will fit
based on its last resizing. In this way, the speaker can
effectively select the number of words to be displayed
by simply resizing the correction window. Figure 1B
illustrates the sample correction window after resizing.
Additionally, the dictation editing component
preferably displays the words in the correction window in
alphabetical order. The displaying of the words in
alphabetical order allows the speaker to quickly locate the
correct word if it is displayed. Prior dictation editing
components would display words in correction windows in
an order based on the probability as determined by the
recognition component. However, when displayed in
probability order, it may be difficult for a speaker to
locate the correct word unless the correct word is
displayed first or second.
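
The selection and ordering described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `words_for_correction_window`, the `(word, probability)` pair representation, and the `capacity` parameter (the number of words that fit in the window at its current size) are hypothetical and are not part of the described system.

```python
def words_for_correction_window(recognized, capacity):
    """Pick the `capacity` most probable words, then order them alphabetically.

    `recognized` is an assumed representation of the recognized word list:
    a list of (word, probability) pairs.
    """
    # Highest-probability words first; sorted() is stable, so words with
    # equal probability keep their order from the recognized word list.
    top = sorted(recognized, key=lambda pair: pair[1], reverse=True)[:capacity]
    # Display order is alphabetical rather than by probability.
    return sorted((word for word, _ in top), key=str.lower)


recognized = [("fake", 0.4), ("make", 0.3), ("bake", 0.1), ("mace", 0.1)]
print(words_for_correction_window(recognized, 3))
```

Resizing the window would simply change `capacity` on the next call; the recognized word list itself is unchanged.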

Figure 2A illustrates an adjustable dictation window
for a CSR system that interfaces with a word processor.
The CSR system inputs a series of utterances from the
speaker, recognizes the utterances, and displays
recognized words for the utterances in the dictation window
201. Since the dictation window is controlled by the
dictation editing component, the speaker can correct the
words in the dictation window. Thus, when a speaker
selects to correct a word within the dictation window, the
speaker can use any of the correction facilities
supported by the dictation editing component. For example,
the speaker can use the correction window to display
the words in the recognized word list for any word
currently displayed in the dictation window. Figure 2B
illustrates the use of a correction window to correct text in
the dictation window.

In one embodiment, the dictation editing component
allows a speaker to adjust the amount of speech
that the dictation window can accommodate. Since the
speaker can only use the correction facilities on words
within the dictation window, but not on words within the
word processor window, the speaker can adjust the size
of the dictation window to accommodate the amount of
speech based on the dictation habits of the speaker. For
example, the speaker can specify that the dictation
window should only accommodate one sentence, one
paragraph, or a fixed number of words. Alternatively, the
speaker can resize the dictation window using standard
window resizing techniques to indicate that the dictation
window should accommodate as many words as can fit
into the window. When the dictation window becomes
full, the CSR system transmits either all of the words or
some of the words in the dictation window to the word
processor. For example, if the speaker indicates that the
dictation window should accommodate a sentence,
then any time a new sentence is started, the CSR
system would transmit all of the words (i.e., one sentence)
to the word processor. Conversely, if the speaker
resized the dictation window, then the CSR system may
transmit only a line of words at a time to the word
processor.
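
A minimal sketch of the one-sentence buffering case is given below. The class name `DictationBuffer`, the callback `send_to_word_processor`, and the use of trailing punctuation to detect the end of a sentence are all assumptions made for illustration; the description does not specify how sentence boundaries are detected.

```python
class DictationBuffer:
    """Holds recognized words until a new sentence starts, then flushes them."""

    def __init__(self, send_to_word_processor, sentence_end=(".", "?", "!")):
        self.words = []                      # words still correctable in the dictation window
        self.send = send_to_word_processor   # hypothetical callback into the word processor
        self.sentence_end = sentence_end     # assumed sentence-ending punctuation

    def add_word(self, word):
        # A new sentence starts when the buffered text already ends a sentence,
        # so the completed sentence is transmitted before the new word is kept.
        if self.words and self.words[-1].endswith(self.sentence_end):
            self.send(list(self.words))
            self.words.clear()
        self.words.append(word)


sent = []
buffer = DictationBuffer(sent.append)
for word in ["I", "will", "make", "the", "cake.", "It"]:
    buffer.add_word(word)
print(sent, buffer.words)
```

Until the next sentence begins, the completed sentence stays in the buffer, which is what keeps it available to the correction facilities.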

Figure 3A illustrates the word correction metaphor
for the dictation editing component. When a word
processing system is in dictation mode, the dictation
editing component automatically changes the definition
of various editing events (e.g., keyboard events, mouse
events, pen events, and speech events) to be
word-based, rather than character-based. For example, when
in dictation mode, the backspace key, which normally
backspaces one character, is modified to backspace a
word at a time. Thus, when the user depresses the
backspace key when in dictation mode, the entire word
to the left of the current insertion point is deleted.
Similarly, when in dictation mode, the right and left arrow
keys will cause the insertion point to move left or right
one word, and the delete key will delete the entire word
to the right of the insertion point. Also, when a user
clicks with a button of the mouse and the mouse pointer
is over a word, the dictation editing component selects
the word that the mouse pointer is over, rather than
simply setting the insertion point to within the word.
However, if the mouse pointer is in between words, then
an insertion point is simply set in between the words.
Lines 301-304 illustrate sample effects of the word
correction metaphor. Each line shows the before and after
text when the indicated event occurs. For example, line
302 shows that if the insertion point is after the word
"test," then the left arrow event will cause the insertion
point to be moved before the word "test." The use of the
word correction metaphor facilitates the correction of
words when in dictation mode because typically
speakers wish to re-speak the entire word when correcting.
Thus, when a speaker clicks on a word, the entire word
is selected and the speaker can simply speak to replace
the selected word. When the speech recognition is
continuous, a phrase correction metaphor may be
preferable. Because continuous speech recognition may not
correctly identify word boundaries, the word correction
metaphor may select a misrecognized word whose
utterance represents only a part of a word or represents
multiple words. It may be preferable in such situations to
simply respeak the entire phrase. Consequently, the
definition of various editing events would be changed to
be phrase-based, rather than being changed to be
word-based. For example, the editing event of the user
speaking the word "backspace" that would normally
backspace over the previous character would be changed to
backspace a phrase at a time. Figure 3B illustrates this
phrase correction metaphor.
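
The word-based backspace and left-arrow events might be sketched as below. This is an illustrative assumption rather than the described implementation: text is modeled as a plain string, words are taken to be separated by single spaces, and the helper names `word_start_left` and `backspace_word` are hypothetical.

```python
def word_start_left(text, insertion_point):
    """Return the index where the word to the left of the insertion point begins."""
    left = text[:insertion_point].rstrip()   # skip any spaces just before the point
    return left.rfind(" ") + 1               # 0 when there is no earlier space


def backspace_word(text, insertion_point):
    """Delete the whole word to the left of the insertion point.

    Returns the new text and the new insertion point.
    """
    start = word_start_left(text, insertion_point)
    return text[:start] + text[insertion_point:], start


text = "this is a test"
print(word_start_left(text, len(text)))   # left arrow from the end of the text
print(backspace_word(text, len(text)))    # backspace from the end of the text
```

A phrase-based variant would follow the same shape, with phrase boundaries supplied by the recognizer in place of spaces.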

In one embodiment, the CSR system provides mis- 
recognized word elimination to prevent re-recognition of 
a respoken utterance as the same word that is being 
corrected. The dictation editing component determines 
when a speaker is correcting a misrecognized word. 
The speaker can correct a misrecognized word in differ- 
ent ways. For example, the speaker could delete the 
word and then speak with the insertion point at the loca- 
tion where the word was deleted. Alternatively, the 
speaker could highlight the misrecognized word and 
then speak to replace that highlighted word. When the 
recognition component receives a respoken utterance, 
it recognizes the utterance and sends a new recognized 
word list to the dictation editing component. The dicta- 
tion editing component then selects and displays the 
word from the new recognized word list with the highest 
probability that is other than the word being corrected. 
In one embodiment, the dictation editing component 
uses the previous recognized word list for the misrecog- 
nized utterance and the new recognized word list to 
select a word (other than the word being corrected) that 
has the highest probability of matching both utterances. 
To calculate the highest probability, the dictation editing 
component identifies the words that are in both recog- 
nized word lists and multiplies their probabilities. For 
example, the following table illustrates sample recognized
word lists and the corresponding probabilities.



Previous Recognized Word List    New Recognized Word List
Fake  .4                         Fake  .4
Make  .3                         Mace  .3
Bake  .1                         Make  .2
Mace  .1                         Bake  .1



If the speaker spoke the word "make," then without mis- 
recognized word elimination the dictation editing com- 
ponent would select the word "fake" both times since it 
has the highest probability in both lists. With misrecog- 
nized word elimination, the dictation editing component 
selects the word "mace" when the word "fake" is cor- 
rected since the word "mace" has the highest probability 
other than the word "fake" in the current list. However, 
when the probabilities from both recognized word lists 
are combined, the dictation editing component selects 
the word "make" as the correct word since it has the 
highest combined probability. The combined probability 
for the word "make" is .06 (.3 x .2), for the word "mace" 
is .03 (.1 x .3), and for the word "bake" is .01 (.1 x .1). 
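
The combined-probability selection in this example can be reproduced with a short sketch. The function name `select_with_elimination` and the dictionary representation of the word lists are assumptions for illustration. The fallback used when the two lists share no words other than the corrected one is also an assumption; it applies the simpler rule described earlier (the most probable word in the new list other than the word being corrected).

```python
def select_with_elimination(previous, new, corrected):
    """Choose the replacement for `corrected` using both recognized word lists.

    `previous` and `new` map each word to its probability.
    """
    # Multiply probabilities for words present in both lists, skipping the
    # word being corrected so it cannot be recognized again.
    combined = {
        word: previous[word] * new[word]
        for word in new
        if word in previous and word != corrected
    }
    if combined:
        return max(combined, key=combined.get)
    # Assumed fallback: most probable word in the new list other than `corrected`.
    candidates = {word: p for word, p in new.items() if word != corrected}
    return max(candidates, key=candidates.get) if candidates else None


previous = {"fake": 0.4, "make": 0.3, "bake": 0.1, "mace": 0.1}
new = {"fake": 0.4, "mace": 0.3, "make": 0.2, "bake": 0.1}
print(select_with_elimination(previous, new, "fake"))
```

With the sample lists, "make" wins because its combined probability is larger than that of "mace" or "bake", matching the figures given in the text.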

The CSR system also automatically adds words to 
its vocabulary and automatically trains. When a user 
corrects a misrecognized word by typing the correct
word, the dictation editing component determines 
whether that typed word is in the vocabulary. If the typed 
word is not in the vocabulary, then the dictation editing 
component directs the recognition component to add it 
to the vocabulary using the spoken utterance that was 
misrecognized to train a model for that word. If, how- 
ever, the typed word is in the vocabulary, the dictation 
editing component then automatically directs the recog- 
nition component to train the typed word with the spo- 
ken utterance that was misrecognized. 

The dictation editing component allows for phrase 
correction, in addition to word correction, when used 
with a continuous dictation system. In a continuous dic- 
tation system, the recognition component may incor- 
rectly identify a word boundary. For example, a speaker 
may say the phrase "I want to recognize speech." The 
recognition component may recognize the spoken 
phrase as "I want to wreck a nice beach." However, the
use of single word correction does not provide a very 
speaker-friendly way to correct such a misrecognition. If 
the speaker wants to see the alternative words for the 
word "beach," the words "peach," "teach," and maybe
"speech" may be displayed in the correction window. If 
the speaker wants to see alternative words for the word 
"nice," the words "ice" and "rice" may be displayed and 
for the word "wreck," the words "heck" and "rack." Such 
single word correction will not identify the words "recog- 
nize speech." 

The dictation editing component allows for correc- 
tion of phrases so that misrecognitions resulting from 
incorrect word boundaries can be efficiently corrected. 
When a speaker selects a phrase for correction, the dic- 
tation editing component selects and displays a list of 
alternative phrases. For example, if the speaker selects 
"wreck a nice beach," the alternative phrases may be 
"wreck a nice peach," "rack an ice leach," and "recog- 
nize speech." Also, if the speaker selects "wreck a nice," 
the alternative phrases may be "rack on ice" and
"recognize."

Furthermore, when a user selects a misrecognized
phrase for correction, the dictation editing component
assumes that the correct phrase differs from the
misrecognized phrase by more than one word. If only one word
was incorrect in the misrecognized phrase, then the
speaker would simply select that misrecognized word
and not the entire misrecognized phrase. Using this
assumption, the dictation editing component does not
display any alternative phrases that differ from the
misrecognized phrase by only one word. Continuing with
the previous example, if the speaker selects "wreck a
nice beach," then only the alternative phrases "rack an
ice leach" and "recognize speech" would be displayed.
Since the alternative phrase "wreck a nice peach"
differs by only one word, it is not displayed. Additionally, in
one embodiment, the dictation editing component
makes the assumption that when a speaker selects a
phrase for correction, the misrecognition was the
result of an incorrectly identified word boundary. In
particular, if the phrase could be corrected by selecting
displayed alternative words, then the speaker would have
selected those alternative words. Consequently, the
dictation editing component would not display any
alternative phrase that could be corrected by correcting
individual words from the alternative list. For example,
the dictation editing component would not display the
phrase "rack an ice leach" if the words "rack," "an," "ice,"
and "leach" were alternative words for the corresponding
misrecognized words.
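
The first filtering rule (hiding alternative phrases that differ by only one word) can be sketched as follows. The function names are hypothetical, and the definition of "differs by only one word" used here, namely the same number of words with exactly one position changed, is an assumption; the description does not define the comparison precisely.

```python
def differs_by_one_word(phrase, other):
    """True when both phrases have the same word count and exactly one word differs."""
    a, b = phrase.split(), other.split()
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def alternatives_to_display(misrecognized, alternatives):
    """Drop alternatives that a single-word correction could already reach."""
    return [p for p in alternatives if not differs_by_one_word(misrecognized, p)]


alternatives = ["wreck a nice peach", "rack an ice leach", "recognize speech"]
print(alternatives_to_display("wreck a nice beach", alternatives))
```

The second rule in the text (hiding phrases reachable by correcting several individual words from their alternative lists) would need the per-word alternative lists as an extra input and is not shown here.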

Figure 4A is a block diagram of a computer system
of a preferred embodiment. The computer system 400
contains a memory 401, a central processing unit 402,
an I/O interface unit 403, storage devices 404, a display
device 405, a keyboard 406, a mouse 407, and a
microphone 408. The memory contains a CSR system
comprising a model component 408, a recognition
component 409, and a dictation editing component 410
and contains an application program 411. The model
component contains the various model utterances for
the words in the vocabulary. The recognition component
receives spoken utterances and accesses the model
component to generate the recognized word list. The
dictation editing component receives the recognized
word list and displays the recognized words. The
recognition component, dictation editing component, and
application program can be interconnected in various
ways. Figures 4B-4C are block diagrams illustrating
various interconnections of the recognition component,
dictation editing component, and application program. In
Figure 4B, the recognition component interfaces with an
application programming interface (API) of the dictation
editing component, which in turn interfaces with an API
of the application program. In Figure 4C, the recognition
component interfaces with the APIs provided by the
dictation editing component and the application program.
Alternatively, the application program could interface
with APIs provided by the recognition component and
the dictation editing component.

Figure 5A is a flow diagram of a CSR system with a
resizable correction window. By resizing the correction
window, a speaker can indicate the number of words
from the recognized word list that should be displayed.
In steps 5A01-5A10, the CSR system loops receiving
utterances that correspond to words, displaying
recognized words, and allowing a speaker to correct the
words. In step 5A01, if the speaker is to continue with
dictation, then the system continues at step 5A02, else
the dictation is complete. In step 5A02, the system
inputs the next utterance from the speaker. In step
5A03, the system invokes the recognition component to
recognize the spoken utterance. The recognition
component returns the recognized word list with a
probability that each word in the list corresponds to the spoken
utterance. In step 5A04, the system selects and
displays the word with the highest probability from the
recognized word list. In steps 5A05-5A10, the system loops
to allow the speaker to correct displayed words. In step
5A05, if the speaker indicates to correct the displayed
word, then the system continues at step 5A06, else the
system loops to step 5A01 to continue with the dictation.
In step 5A06, the system determines the current size of
the correction window. In step 5A07, the system
determines the number of words that can fit into the
correction window based on its current size. In step 5A08, the
system selects that number of words with the highest
probability from the recognized word list and displays
those words in the correction window. In one
embodiment, the system sorts those selected words
alphabetically before displaying them. In step 5A09, the system
receives the correct word from the speaker. In step
5A10, the system replaces the displayed word with the
correct word and loops to step 5A05.

Figure 5B is a flow diagram of a window procedure
for the correction window. The window procedure
receives and controls the processing of all events (i.e.,
messages) that are directed to the correction window. In
step 5B01, if a message is received indicating that the
window is being resized, then the procedure continues
at step 5B02, else the procedure continues with normal
processing of other messages. In step 5B02, the
procedure stores the new size of the correction window. In
addition, the procedure may indicate that the CSR
system should recalculate the number of words that fit into
the correction window and redisplay the correction
window with that number of words.
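
A window procedure of this kind might look like the sketch below. Everything named here is hypothetical: the class `CorrectionWindow`, the string message `"resize"`, the pixel-based `height` and `row_height` fields, and the choice to always leave room for at least one word are assumptions for illustration and do not correspond to any particular windowing API.

```python
class CorrectionWindow:
    """Tracks the correction window size and how many words it can show."""

    def __init__(self, height, row_height):
        self.height = height          # current window height, assumed in pixels
        self.row_height = row_height  # assumed height of one displayed word

    def capacity(self):
        # Number of whole rows that fit; at least one word is always shown
        # (an assumption, not stated in the description).
        return max(1, self.height // self.row_height)

    def handle_message(self, message, payload=None):
        """Return True when the caller should recalculate and redisplay."""
        if message == "resize":
            self.height = payload     # store the new size
            return True
        return False                  # other messages get normal processing


window = CorrectionWindow(height=60, row_height=20)
print(window.capacity())
window.handle_message("resize", 130)
print(window.capacity())
```

Because the stored size persists between corrections, the next correction window opens with the capacity from the last resize.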

Figure 6 is a flow diagram of the adjustable dictation
window processing of a CSR system. The adjustable
dictation window allows the speaker to specify the
amount of speech that the dictation window can
accommodate. The speaker can then use the correction
facilities of the dictation editing component to correct that
amount of speech that was last spoken. In step 601, the
system displays the dictation window. In steps 602-609,
the system loops processing each unit of speech (e.g.,
sentence or paragraph) and when a unit has been
spoken, it sends that unit to the application program. The
unit of speech may also be a line of words when the
dictation window has been resized. In step 602, if the end
of a speech unit has been received, then the system
continues at step 610, else the system continues at step
603. In step 610, the system sends the unit of speech to
the application program and continues at step 603. In
step 603, if the speaker indicates that dictation is
complete, then the system is done, else the system
continues at step 604. In step 604, the system inputs a spoken
utterance from the speaker. In step 605, the system
invokes the recognition component to recognize the
spoken utterance and to return the recognized word list.
In step 606, the system saves the recognized word list
for later correction. In step 607, the system selects and
displays the word with the highest probability in the
recognized word list. In step 608, if the speaker indicates to
enter the correction mode, then the system continues at
step 609, else the system loops to step 602 to
determine if the end of the speech unit has been reached. In
step 609, the system allows the speaker to correct any
of the words within the dictation window. The system,
when requested by the speaker, displays a correction
window with words from the saved recognized word
lists. The system then loops to step 602 to input the next
utterance.

Figure 7 is a flow diagram of a window procedure 
for an application program or dictation editing compo- 
nent that implements the word correction metaphor. 
When in dictation mode, the component changes the 
editing behavior to be word-oriented, rather than char- 
acter-oriented. In steps 701-705, the procedure deter- 
mines which message has been received. In step 701, 
if a dictate enable message has been received, then the 
procedure continues at step 701A, else the procedure
continues at step 702. In step 701A, the procedure sets
the mode to a dictation mode and returns. In step 702, 
if the message is a dictate disable message, then the 
procedure continues at step 702A, else the procedure 
continues at step 703. In step 702A, the procedure sets 
the mode to indicate that data entry is through the key- 
board rather than through dictation, and returns. In step 
703, if the message is a receive character message, 
then the procedure continues at step 703A, else the 
procedure continues at step 704. In step 703A, the pro- 
cedure displays the received character. The character 
may have been received either through a keyboard 
entry or as one of the characters of a recognized word. 
In step 704, if the message is a backspace message, 
then the procedure continues at step 704A, else the 
procedure continues at step 705. In step 704A, if the 
current mode is dictation, then the procedure continues 
at step 704C, else the procedure continues at step 
704B. In step 704C, the procedure backspaces one 
word from the current insertion point. The backspacing 
of one word deletes the word to the left of the insertion 
point and returns. In step 704B, the procedure performs 
the normal backspace of one character and returns. In 
step 705, if the message is a mouse click message, 
then the procedure continues at step 705A, else the 
procedure continues with normal processing. In step 
705A, if the current mode is dictation, then the procedure
continues at step 705C, else the procedure continues
at step 705B. In step 705C, if the click is within a
word, then the procedure selects the entire word. Other- 
wise, the procedure sets the insertion point in between 
the words and returns. In step 705B, the procedure sets 
the insertion point as normal and returns. 
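
The mouse-click branch of this procedure can be illustrated with a small sketch. The function name `handle_click`, the string model of the text, the single-space word separator, and the `(start, end)` return convention are assumptions made for the example.

```python
def handle_click(text, position, dictation_mode):
    """Return (selection_start, selection_end) for a click at `position`.

    Equal start and end values mean a bare insertion point with nothing selected.
    """
    # Outside dictation mode, or when the click lands between words,
    # only the insertion point is set.
    if not dictation_mode or position >= len(text) or text[position] == " ":
        return position, position
    # In dictation mode a click inside a word selects the entire word.
    start = text.rfind(" ", 0, position) + 1
    end = text.find(" ", position)
    if end == -1:
        end = len(text)
    return start, end


text = "this is a test"
print(handle_click(text, 11, dictation_mode=True))
print(handle_click(text, 11, dictation_mode=False))
```

Once a whole word is selected this way, the next spoken utterance can replace it directly.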

Figure 8 is a flow diagram of a dictation editing
component that eliminates misrecognized words from
further recognition. The component detects when a
speaker is speaking to correct a misrecognized word
and prevents the respoken utterance from being
recognized again as that misrecognized word. In step 801, if the
dictation is complete, then the component is done, else
the component continues at step 803. In step 803, the
component receives a recognized word list from the
dictation component. In step 804, if the spoken utterance is
an attempt by the speaker to correct a misrecognized
word, then the component continues at step 805, else
the component continues at step 806. In step 805, the
component selects a word other than the word being
corrected from the recognized word list and continues at
step 807. In step 806, the component selects the most
probable word from the recognized word list. In step
807, the component displays the selected word. In step
808, if the speaker indicates to enter a correction mode,
then the component continues at step 809, else the
component loops to step 801 to input another utterance.
In step 809, the component receives the correction for a
displayed word. In step 810, if the correction was
entered through the keyboard, then the component
continues at step 811, else the component loops to step
801 to select the next input utterance. In step 811, if the
typed word is already in the vocabulary, then the
component continues at step 813, else the component
continues at step 812. In step 812, the component adds the
typed word to the vocabulary. In step 813, the component
trains the recognition system on the typed-in word
and loops to step 801 to input the next utterance.
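
As a rough illustration of steps 804 through 806 and steps 811 through 813, the following sketch shows one way the word selection and the typed-correction handling might look. The names `select_word` and `apply_typed_correction`, the representation of the recognized word list as (word, probability) pairs, and the `train` callback are hypothetical. The description says only that a word other than the one being corrected is selected; choosing the most probable remaining word is an assumption of this sketch.

```python
def select_word(recognized_word_list, word_being_corrected=None):
    """Pick the word to display from a list of (word, probability) pairs.

    When the utterance is a respoken correction, the word being corrected
    is excluded so it cannot be selected again (step 805). Otherwise the
    most probable word is chosen (step 806).
    """
    candidates = [(word, prob) for word, prob in recognized_word_list
                  if word != word_being_corrected]
    if not candidates:
        return None  # nothing left to offer; the caller decides what to do
    return max(candidates, key=lambda pair: pair[1])[0]


def apply_typed_correction(typed_word, vocabulary, train):
    """Handle a correction entered through the keyboard (steps 811-813)."""
    if typed_word not in vocabulary:
        vocabulary.add(typed_word)  # step 812: add the new word
    train(typed_word)               # step 813: train on the typed-in word
```

For example, if "fare" was displayed and the speaker respeaks the word, calling `select_word` with `word_being_corrected="fare"` skips "fare" even if it is again the top-ranked candidate.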

Figure 9 is a flow diagram of a dictation editing
component that automatically trains the recognition
process. The dictation editing component collects utterances
that were misrecognized along with the corrected
word or phrase. The dictation editing component then
directs the recognition component to train the recognition
process to recognize the misrecognized utterances
as the corrected word or phrase. This training can be
performed as each misrecognized utterance is corrected,
or the information can be saved and the training performed
at a later time. In steps 901-903, the component collects
misrecognized utterances and the correct word or
phrase. This information can be collected when the
component detects that a speaker has corrected a word
or phrase. In step 903, the component determines
whether the recognizer should be trained. Such training
can be performed at times when the computer system
would otherwise be idle or when the accuracy of recognition
is unacceptable. In step 904, the component
trains the recognizer on the collected utterances.
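
The collect-then-train behavior of Figure 9 can be sketched as follows. This is an illustrative outline only: the class name `CorrectionCollector`, the `train_fn` callback, and the `min_accuracy` threshold of 0.9 are assumptions, since the description does not say how idleness or unacceptable accuracy is measured.

```python
class CorrectionCollector:
    """Collects (utterance, corrected text) pairs and trains on them later."""

    def __init__(self, train_fn, min_accuracy=0.9):
        self.train_fn = train_fn          # hypothetical recognizer training hook
        self.min_accuracy = min_accuracy  # assumed threshold, not from the source
        self.pending = []

    def record(self, utterance, corrected_text):
        # Called when the speaker has corrected a word or phrase.
        self.pending.append((utterance, corrected_text))

    def should_train(self, system_idle, accuracy):
        # Train when there is something to train on and either the system
        # is otherwise idle or recognition accuracy is unacceptable.
        return bool(self.pending) and (system_idle or accuracy < self.min_accuracy)

    def train(self):
        # Train the recognizer on every collected utterance, then reset.
        count = len(self.pending)
        for utterance, corrected_text in self.pending:
            self.train_fn(utterance, corrected_text)
        self.pending.clear()
        return count
```

Training immediately after each correction corresponds to calling `train` right after every `record`; deferring it corresponds to polling `should_train` and calling `train` only when it returns true.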

Although the present invention has been described 
in terms of a preferred embodiment, it is not intended
that the invention be limited to this embodiment. Modifi- 
cations within the spirit of the invention will be apparent 
to those skilled in the art. The scope of the present 
invention is defined by the claims that follow. 
Claims

1. A method in a dictation editing system for buffering
recognized words before sending to an application
program, the method comprising:

receiving from a speaker an indication of an 
amount of speech; 



receiving utterances from the speaker; 
recognizing the received utterances as recog- 
nized words; 

displaying the recognized words in a dictation 
window; 

in response to a request from the speaker to 
correct a displayed word, 

displaying a list of alternative words for the 
word to correct; and 

replacing the word to correct with an alter- 
native word from the list; and 

when the indicated amount of speech has been 
recognized and displayed, transferring to the 
application program system words displayed in 
the dictation window. 

2. The method of claim 1 wherein the amount of 
speech is indicated to be a sentence. 

3. The method of claim 1 wherein the amount of 
speech is indicated to be a paragraph. 

4. The method of claim 1 wherein the amount of 
speech is indicated by resizing the dictation win- 
dow. 

5. The method of claim 1 wherein the step of recogniz- 
ing uses continuous speech recognition. 

6. The method of claim 1 wherein the step of recogniz- 
ing uses discrete speech recognition. 

7. The method of claim 1 wherein the application pro- 
gram is a word processor. 

8. A method in a computer system for delaying trans- 
mission of words from a dictation editing system to 
a processing system so that a user can correct any 
words misrecognized, the method comprising: 

receiving from the user an indication of an
amount of recognized words;

receiving representations of words;

recognizing the received representations as
recognized words;

displaying the recognized words;

correcting the displayed words as directed by
the user; and

when the indicated amount of recognized 
words have been recognized and displayed, 
transferring to the processing system some of 
the displayed words. 

9. The method of claim 8 wherein the received repre- 
sentations are spoken utterances. 
10. The method of claim 8 wherein the amount of recognized
words is indicated to be a sentence.

11. The method of claim 8 wherein the amount of recognized
words is indicated to be a paragraph.

12. The method of claim 8 wherein the amount of rec- 
ognized words is indicated by resizing a window in 
which the words are displayed. 

13. The method of claim 8 wherein the step of recognizing
uses continuous speech recognition.

14. The method of claim 8 wherein the step of recognizing
uses discrete speech recognition.

15. A computer system for delayed transmission of
words from a dictation editing system to a processing
system so that a user can correct any words
misrecognized by the dictation editing system,
comprising:

means for receiving from the user an indication
of an amount of recognized words;
means for receiving representations of words;
means for recognizing the received representations
as recognized words;
means for displaying the recognized words;
means for correcting the displayed words as
directed by the user; and
means for transferring to the processing system
some of the displayed words when the indicated
amount of recognized words have been
recognized and displayed.

16. The computer system of claim 15 wherein the 
received representations are spoken utterances. 

17. The computer system of claim 15 wherein the
amount of recognized words is indicated to be a
sentence.

18. The computer system of claim 15 wherein the 
amount of recognized words is indicated to be a 
paragraph.

19. The computer system of claim 15 wherein the 
amount of recognized words is indicated by resizing 
a window in which the words are displayed. 

20. A computer-readable medium containing instruc- 
tions for causing a computer system to delay trans- 
mission of words from a dictation editing system to 
a processing system so that a user can correct any 
words misrecognized, by:

receiving from the user an indication of an 
amount of recognized words; 



receiving spoken utterances from the user;

recognizing the received spoken utterances as
recognized words;

displaying the recognized words;

correcting the displayed words as directed by
the user; and

when the indicated amount of recognized 
words have been recognized and displayed, 
transferring to the processing system a portion 
of the displayed words as corrected. 

21. The computer-readable medium of claim 20 
wherein the amount of recognized words is indi- 
cated to be a sentence. 

22. The computer-readable medium of claim 20 
wherein the amount of recognized words is indi- 
cated to be a paragraph. 

23. The computer-readable medium of claim 20 
wherein the amount of recognized words is indi- 
cated by resizing a window in which the words are 
displayed. 

24. The computer-readable medium of claim 20 
wherein the recognizing uses continuous speech 
recognition. 

25. The computer-readable medium of claim 20 
wherein the recognizing uses discrete speech rec- 
ognition. 